We study the evolution of a random graph under the constraint that the diameter remain constant
as the graph grows. We show that if the graph maintains the form of its link distribution, it must be
scale-free with exponent between 2 and 3. These uniqueness results may help explain the scale-free
nature of graphs, of varying sizes, representing the evolved metabolic pathways in 43 organisms.



In recent years, measurements on a wide variety of networks, such as the World Wide Web [13, ?], the Internet backbone [?], social networks [?] and metabolic networks [?], have shown that they differ significantly from the classic Erdos-Renyi model of random graphs [17]. While the traditional Erdos-Renyi model has a Poisson link distribution, with most nodes having a characteristic number of links, these networks have scale-free link distributions following a power law $p(x) \sim x^{-\gamma}$. To account for these observations, the traditional Erdos-Renyi growth process [?] has been replaced by newer processes [6, ?] relying on the intuitively appealing idea of preferential attachment. These processes have been extensively studied in [18, ?].



While models of preferential attachment provide an explanation of the prevalence of scale-free networks, the models are largely endogenous and do not take account of global exogenous selection pressures which might shape the form of evolving and growing networks. Such selection pressures would be especially relevant in a biological context. Recent measurements of the topological properties of graphs representing the metabolic networks of 43 organisms have demonstrated their scale-free nature [14]. These metabolic networks are a rare example of different graphs of varying size shaped by similar selection pressures, and allow the testing of explanations for their generic features.

The main selection-based explanation [?] for the metabolic network topologies relies on the fact that scale-free networks are robust with respect to random malfunction of nodes [?]. Robustness is identified with the diameter of the network: scale-free networks maintain their diameter when nodes are eliminated at random. However, while scale-free graphs are robust in this sense, it has not been shown that robust graphs must be scale-free. This leaves open the question of why metabolic networks are scale-free.

In this paper we consider the evolution of random graphs under the constraint that the diameter remain constant as the graph grows. We show that if the graph maintains the form of its link distribution, it must be scale-free with exponent between 2 and 3. These uniqueness results may help explain the (apparently universal) scale-free nature of graphs, of varying sizes, representing the evolved metabolic pathways of different organisms. Our assumptions and results are consistent with experimental findings.



We first present a brief introduction to the study of metabolic networks, review the findings of previous investigations, and present the basic definitions necessary for the rest of the paper.

A cell is a complex system composed of numerous organic constituents thickly interwoven in a web of reactions. The processes underlying the life of the cell, which include the generation of mass and energy and information transfer, are a result of this network of complex interactions [?].

Much work has been done on understanding the control processes underlying the workings of a cell [?]. However, many fundamental questions remain open. While it is important to uncover the fundamental design principles underlying the organization of a cell, progress in this direction has been limited by the immense complexity of the cell and the lack of good abstractions that capture aspects of its large-scale organization.

One possible abstraction of this web of interactions is 
to represent the gamut of chemical reactions by a graph, 
where each node represents a chemical constituent of the 
cell and a directed edge from one chemical constituent 
A to another constituent B implies that B is a product 
of a reaction between A and other chemical constituents. 
Such a graph representation is referred to as a metabolic 
network. 

Large-scale sequencing projects have furnished integrated pathway-genome databases [15, ?] from which metabolic networks can be inferred.

Recently, such databases have been used [?] to analyze the topological properties of the metabolic networks of 43 different organisms, including E. coli (a bacterium) and Caenorhabditis elegans (a eukaryote). The authors found remarkable similarities in these properties: in short, these networks were uniformly scale-free with exponents between 2 and 3.

We now present a brief recap of the basic definitions necessary to understand our results.

A directed graph $G(V, E)$ is a collection of points $V$ connected by edges $E$ such that each edge points from one point to another. We will also use the word node to denote a point in the graph.

The degree of a node is the number of edges attached 
to it. 

The outgoing degree is the number of edges going out 
and the ingoing degree is the number of edges coming in. 

The link distribution $p(k)$ of a graph is the probability that a node chosen at random has $k$ edges going into it (or going out of it). Note that there are two different link distributions: ingoing and outgoing.

The diameter of a graph is the average number of steps it takes to go from a node to any other node.

The $n$th moment $M_n$ of a distribution $p(k)$ is defined as

$$ M_n = \sum_k p(k)\, k^n \tag{1} $$
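As a concrete illustration of definition (1), the sketch below computes $M_n$ from an empirical degree sequence, taking $p(k)$ to be the fraction of nodes with degree $k$. The function name `moment` and the example degree list are hypothetical, chosen only for illustration.

```python
from collections import Counter

def moment(degrees: list[int], n: int) -> float:
    """M_n = sum_k p(k) * k**n, with p(k) the fraction of nodes of degree k."""
    counts = Counter(degrees)   # k -> number of nodes with degree k
    total = len(degrees)
    return sum((count / total) * k ** n for k, count in counts.items())

degrees = [1, 1, 2, 4]          # hypothetical degree sequence
print(moment(degrees, 0))       # 1.0 (p(k) is normalized)
print(moment(degrees, 1))       # 2.0 (average degree)
print(moment(degrees, 2))       # 5.5
```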



A metabolic network is a directed graph with nodes representing the various chemical species. There is a directed edge from A to B if A participates in a reaction which leads to B.

We now proceed to the main result of this paper. We 
study the evolution of a random graph under the con- 
straint that its diameter is constant as the graph grows. 

Let us assume that the outgoing link distribution of the growing graph is $p_k$, and that the graph has a variable number of nodes, denoted by $N$. If these graphs are random, their diameter at size $N$ is approximately given by the formula [?]

$$ D \approx \frac{\log N}{\log E} \tag{2} $$

where the quantity $E$ is

$$ E = \frac{M_2}{M_1} = \frac{\int^{k_c} k^2\, p(k)\, dk}{M_1} \tag{3} $$

and $k_c$ is the largest degree in the graph.



The quantity $E$ represents the expected degree found by following a link. Starting from a node A on a graph, the expected degree of a node B reached by a random edge traversal will in general differ from the average degree of the graph. This is because high-degree nodes have more links: following a link chosen at random, the probability that the resulting node has high degree is higher than if the node were chosen at random [16].
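The effect described above can be checked with a minimal sketch, assuming an undirected degree sequence for simplicity (the text works with outgoing links). A node reached along a random link is chosen with probability proportional to its degree, so the mean degree at the end of a link is $\sum k^2 / \sum k = M_2/M_1$, which is what (3) expresses. The helper names and the star-shaped example (one hub, three leaves) are hypothetical.

```python
import random

def expected_degree_following_link(degrees: list[int]) -> float:
    """E = M2 / M1 = sum(k^2) / sum(k) for a degree sequence."""
    return sum(k * k for k in degrees) / sum(degrees)

def sampled_degree_following_link(degrees: list[int], trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: pick a random link end and record the degree of its node."""
    rng = random.Random(seed)
    # Each node of degree k contributes k link ends ("stubs"),
    # so it is sampled with probability proportional to k.
    stubs = [k for k in degrees for _ in range(k)]
    return sum(rng.choice(stubs) for _ in range(trials)) / trials

degrees = [1, 1, 1, 3]                            # hypothetical star: one hub, three leaves
print(sum(degrees) / len(degrees))                # average degree: 1.5
print(expected_degree_following_link(degrees))    # 2.0
print(sampled_degree_following_link(degrees))     # close to 2.0
```

The degree seen at the end of a link (2.0) exceeds the average degree (1.5) because the hub owns half of all link ends.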

We now impose the constraint that the diameter is constant as the graph grows. This amounts to demanding that $D$ be independent of $N$ in equation (2). This holds if the denominator scales as $\log N$, i.e.

$$ \log E = \alpha \log N \tag{4} $$

where $\alpha$ is a constant. Equation (4) implies

$$ E = N^{\alpha} \tag{5} $$

Note that $\alpha < 1$. This is because $E$ is an average of node degrees, and no node can have degree greater than the total number of nodes; hence $N^{\alpha} < N$, i.e. $\alpha < 1$.
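A short numerical sketch of (2), (4) and (5), assuming the form $D \approx \log N / \log E$: if $E$ grows as $N^{\alpha}$, then $D \approx 1/\alpha$ for every $N$, whereas a fixed $E$ makes $D$ grow like $\log N$. The value $\alpha = 0.5$ and the fixed $E = 5$ are hypothetical.

```python
import math

def diameter_estimate(n_nodes: float, e: float) -> float:
    """Eq. (2): D ≈ log N / log E (natural logs; the base cancels in the ratio)."""
    return math.log(n_nodes) / math.log(e)

alpha = 0.5  # hypothetical exponent in E = N**alpha, eq. (5)
for n in (1e2, 1e4, 1e6):
    growing = diameter_estimate(n, n ** alpha)   # ≈ 1/alpha = 2.0 for every N
    fixed = diameter_estimate(n, 5.0)            # increases with log N
    print(f"N={n:.0e}  D(E=N^alpha)={growing:.3f}  D(E=5)={fixed:.3f}")
```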

Combining (3) and (5), we have

$$ \frac{\int^{k_c} k^2\, p(k)\, dk}{M_1} = N^{\alpha} \tag{6} $$



Since $N$ is a variable here, we can differentiate both sides of (6) with respect to $N$ in order to say something about the relationship between the various quantities. Note that $k_c$ depends on $N$ in some way. Because the functional form of $p(k)$ is assumed not to change as the graph grows, the derivative of the integral depends only on the value of the integrand at the upper endpoint, $\frac{d}{dN}\int^{k_c} k^2 p(k)\,dk = k_c^2\, p(k_c)\,\frac{dk_c}{dN}$, so this differentiation gives an equation entirely in terms of $k_c$.

Differentiating the left-hand side of (6) gives

$$ \frac{d}{dN}\left(\frac{M_2}{M_1}\right) = \frac{1}{M_1}\left(\frac{dM_2}{dN} - \frac{M_2}{M_1}\frac{dM_1}{dN}\right) \tag{7} $$



For any normalizable distribution, $M_2 \ge M_1^2$, because the variance $M_2 - M_1^2$ is non-negative. In the worst case, when $M_1$ is the largest it can be, $M_2 = M_1^2$. Substituting into (7), the first term is $2\,dM_1/dN$ and the second term is $dM_1/dN$. Thus, in the worst case, neglecting the second term does not change the scaling of the left-hand side of (7) with respect to $N$. We may therefore simplify our equation by dropping the derivatives of $M_1$ and substituting $M_1 = a$, where $a$ is a constant. (In most real cases the average degree $M_1$ is only weakly dependent on size, making this realistic.) The resulting equation is

$$ k_c^2\, p(k_c)\, \frac{dk_c}{dN} = a\alpha N^{\alpha - 1} \tag{8} $$



To derive the form of $p_k$ we need to know something about the dependence of $k_c$ on $N$. We observe that the expected degree $E$ of a node reached by following a random link will always be smaller than the largest degree $k_c$. Also note that the largest degree $k_c$ will be smaller than the total number of nodes $N$. Thus $k_c$ is bounded below and above by power laws,

$$ E \approx N^{\alpha} < k_c < N \tag{9} $$

and we therefore assume that $k_c$ itself scales with $N$ as a power law with some intermediate exponent $\beta$, $k_c = bN^{\beta}$, where $\alpha < \beta < 1$.

Putting this into (8), we get an equation describing the function $p(\cdot)$ in terms of $N$:

$$ p(bN^{\beta}) = \frac{a\alpha}{b^{3}\beta}\, N^{\alpha - 3\beta} \tag{10} $$

Substituting $bN^{\beta} = x$, we get

$$ p(x) = \frac{a\alpha}{\beta\, b^{\alpha/\beta}}\, \frac{1}{x^{\gamma}} \tag{11} $$

where $\gamma = 3 - \alpha/\beta$. Since $0 < \alpha < \beta$ ($\alpha > 0$ is required for $D = 1/\alpha$ to be positive and finite), we have $0 < \alpha/\beta < 1$ and hence $2 < \gamma < 3$. This shows that under the constraint that a growing graph has a constant diameter, a link distribution $p(x)$ whose functional form is assumed constant must be a power law: the probability of a node having $x$ links is inversely proportional to $x^{\gamma}$, with an exponent $\gamma$ between 2 and 3.
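The step from (8) to (11) can be checked numerically. The sketch below uses hypothetical parameter values $a = 3$, $\alpha = 0.4$, $b = 0.8$, $\beta = 0.8$ (so $\gamma = 2.5$), evaluates both sides of (8) with $k_c = bN^{\beta}$ and $p$ taken from (11), and confirms that they agree for every $N$.

```python
import math

def gamma_exponent(alpha: float, beta: float) -> float:
    """gamma = 3 - alpha/beta; lies in (2, 3) whenever 0 < alpha < beta."""
    return 3.0 - alpha / beta

def p_eq11(x: float, a: float, alpha: float, b: float, beta: float) -> float:
    """Eq. (11): p(x) = a*alpha / (beta * b**(alpha/beta)) * x**(-gamma)."""
    return a * alpha / (beta * b ** (alpha / beta)) * x ** (-gamma_exponent(alpha, beta))

def eq8_sides(n: float, a: float, alpha: float, b: float, beta: float) -> tuple[float, float]:
    """Left- and right-hand sides of eq. (8) with k_c = b * N**beta."""
    kc = b * n ** beta
    dkc_dn = b * beta * n ** (beta - 1)      # derivative of k_c with respect to N
    lhs = kc ** 2 * p_eq11(kc, a, alpha, b, beta) * dkc_dn
    rhs = a * alpha * n ** (alpha - 1)
    return lhs, rhs

params = dict(a=3.0, alpha=0.4, b=0.8, beta=0.8)   # hypothetical values
print(gamma_exponent(0.4, 0.8))                    # 2.5
for n in (1e2, 1e4, 1e6):
    lhs, rhs = eq8_sides(n, **params)
    print(n, math.isclose(lhs, rhs, rel_tol=1e-9))  # True for each N
```

The powers of $b$ and of $N$ cancel exactly because $\gamma = 3 - \alpha/\beta$, which is why the check holds independently of $N$.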

We next consider our results in the context of evolved metabolic networks. Jeong et al. [14] have measured the link distribution, average degree and diameter for the metabolic networks of 43 different organisms. They found the link distributions to be uniformly scale-free with exponents between 2 and 3. Furthermore, they found that the diameter was constant with respect to size (see Figure 1).

FIG. 1. The diameter $D$ as a function of $N$, the size of the network. Note that it is almost constant with size.

The fact that the metabolic network diameter is constant across sizes suggests an evolutionary selection pressure on the organism as it evolved. Our results suggest that such a constant diameter constraint (the possible biological reasons for which we discuss later) leads to a scale-free link distribution with exponents between 2 and 3, as has been observed. Thus, our results help to explain why such networks are likely to be scale-free.

We now justify the various formulae and assumptions used in the derivation. Figure 1 shows the metabolic network diameters as a function of the size of the network. As is evident, the diameter is constant across sizes at around 3.4. According to the Newman et al. formula, the diameter for these graphs should be around 3.29.* Since these quantities are close, the Newman et al. formula (2) appears to apply to these networks.

*This is because the typical cutoff $k_c$ for the metabolic network graphs scales as $N^{1/1.2}$ according to the data [14] (see Figure 2). Using this fact, the quantity $E$ (see equation (3)) is easily calculated in terms of $N$ for the metabolic networks as $E \approx N^{(3-\gamma)/1.2}$. Substituting this into equation (2) gives an estimate of $D$ that depends only on $\gamma$. Substituting $\gamma \approx 2.2$, consistent with measurements, yields $D = 3.29$.

FIG. 2. The quantity $\log N / \log k_c$ versus $N$. The fact that it is approximately constant at around 1.2 shows that $k_c$ scales with $N$ as $k_c \sim N^{1/1.2}$.

For further validation of our underlying assumptions, Figure 2 demonstrates that for these data the cutoff $k_c$ does scale algebraically with $N$ (consistent with our assumption that $k_c \propto N^{\beta}$). In the figure, $k_c$ has been obtained from the graph data by using the average degree $d$ and the exponent $\gamma$ of the power law, which are related by the formula

$$ d = \sum_k k\, p_k \approx \frac{\gamma - 1}{\gamma - 2}\left(1 - k_c^{2-\gamma}\right) $$
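The relation between $d$, $\gamma$ and $k_c$ can be inverted to recover $k_c$, as was done for Figure 2. The sketch below assumes the continuous form $p(k) = (\gamma-1)k^{-\gamma}$ for $k \ge 1$, truncated at $k_c$ without renormalization, for which the average degree is $d \approx \frac{\gamma-1}{\gamma-2}\left(1 - k_c^{2-\gamma}\right)$; solving for $k_c$ gives $k_c = \left(1 - d\,\frac{\gamma-2}{\gamma-1}\right)^{1/(2-\gamma)}$. It also computes the ratio $\log N / \log k_c$ plotted in Figure 2. The function names and example values are hypothetical, not the measured data.

```python
import math

def mean_degree(kc: float, gamma: float) -> float:
    """d ≈ (gamma-1)/(gamma-2) * (1 - kc**(2-gamma)) for p(k) = (gamma-1) k**(-gamma), 1 <= k <= kc."""
    return (gamma - 1) / (gamma - 2) * (1 - kc ** (2 - gamma))

def cutoff_from_mean_degree(d: float, gamma: float) -> float:
    """Invert mean_degree for k_c; requires 2 < gamma < 3 and d below the uncut mean."""
    limit = (gamma - 1) / (gamma - 2)   # mean degree as k_c -> infinity
    if not (2 < gamma < 3) or not (0 < d < limit):
        raise ValueError("need 2 < gamma < 3 and 0 < d < (gamma-1)/(gamma-2)")
    return (1 - d / limit) ** (1 / (2 - gamma))

def size_cutoff_ratio(n_nodes: float, kc: float) -> float:
    """The quantity log N / log k_c; a constant value c means k_c ~ N**(1/c)."""
    return math.log(n_nodes) / math.log(kc)

d = mean_degree(100.0, 2.2)                              # ≈ 3.61 for a hypothetical cutoff of 100
print(cutoff_from_mean_degree(d, 2.2))                   # recovers ≈ 100
print(size_cutoff_ratio(1000.0, 1000.0 ** (1 / 1.2)))   # ≈ 1.2
```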



Figure 3 shows the variation of the average outgoing degree for the metabolic networks on a log-linear plot. Because the data are approximately linear on a log-linear scale, we conclude that the average outgoing degree grows roughly logarithmically with $N$, confirming that the variation of the average degree ($M_1$) is weak, as we have assumed in the proof.
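One simple way to check logarithmic growth of the kind read off Figure 3 is an ordinary least-squares fit of $d = A + B \ln N$, where $A$ and $B$ are hypothetical fit parameters. The sketch below uses synthetic, noise-free values generated from a known law purely to show that the fit recovers it; it does not use the measured metabolic data.

```python
import math

def fit_log_linear(sizes: list[float], avg_degrees: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of d = A + B*ln(N); returns (A, B)."""
    xs = [math.log(n) for n in sizes]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(avg_degrees) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, avg_degrees))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic example generated from d = 2.0 + 0.5 ln N (not measured data).
sizes = [200.0, 400.0, 800.0, 1600.0]
degrees = [2.0 + 0.5 * math.log(n) for n in sizes]
print(fit_log_linear(sizes, degrees))   # ≈ (2.0, 0.5)
```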

Figures 1-3 show that the formulae and the assumptions used to derive the scale-free form are consistent with the metabolic network data. Our explanation of the observed scale-free link distribution is therefore consistent with measurements.

Further, we provide some biological reasons why metabolic networks might be constrained to have small, constant diameters. With more steps in the network, pathways are longer: converting A to B takes more steps. Because, with long pathways, the driving force at each reaction in the cell gets smaller on average, the cell would be inefficient at doing its job. It has been shown that certain metabolic paths (i.e. routes leading from a chemical A to a chemical B) are the shortest possible to carry out a transformation; other pathways can be designed, but involve more steps and intermediates [12].



According to Fell [25], minimizing transition times between different states, and reducing the time for perturbations to die out, are relevant considerations. Watts [?] has shown that perturbations die out rapidly in networks with small diameters. Thus, small diameters would reduce transition times between different states, suggesting a selective advantage to maintaining the network diameter.

Another reason for maintaining network diameter relates to the minimization of metabolite concentrations [?]. With hundreds of metabolites in cells, their average concentration must be kept low to avoid osmotic and solvation problems. Some metabolites are even toxic if allowed to accumulate. A small diameter leads to quick dissipation of such metabolites.

We have presented a plausible reason for the scale-free distribution observed in metabolic networks, with our assumptions and conclusions being consistent with experiments and with other biological facts. Our argument addresses the issue of why robust networks are likely to be scale-free. Combined with endogenous models of preferential attachment, and the error tolerance of scale-free networks, our results help explain the prevalence of scale-free networks in selective environments.

FIG. 3. The average outgoing degree versus $N$ on a log-linear plot. The figure shows that the average outgoing degree grows logarithmically with $N$.